INTRODUCTION


The power spectrum analysis of time series data is
a powerful tool for SONAR, RADAR, and speech
processing. In many application areas, such as
noise analysis, digital filter design, and signal
tracking, spectrum analysis is a common and convenient
technique [1]. At the heart of most spectrum estimation
applications is the Fast Fourier Transform (FFT) [2]. The
FPS-5000 Series array processor achieves high
performance on FFTs and other signal processing algorithms.
This article describes how the FPS-5410
array processor is applied to a simple peak spectral energy
tracking problem, achieving almost 3.9 times the
performance of an FPS-100.


FPS-5000 SERIES ARCHITECTURE 


The FPS-5000 Series family is based on a distributed
signal processing system concept. High-throughput
processing requirements are met by distributing
various sub-tasks onto individual system elements. The
FPS-5000 Series array processor (shown in Figure 1)
includes a Control Processor used for host communication
and process control, a large System Common
Memory, I/O Coprocessors, and up to three Arithmetic
Coprocessors. Key elements of the FPS-5000 Series used
in this application include the Arithmetic Coprocessor,
the GPIOP I/O Coprocessor, and the System Common
Memory, all of which are described in the following
sections.
Figure 1. FPS-5410 system diagram.


ARITHMETIC COPROCESSOR 
ARCHITECTURE 


The Arithmetic Coprocessor is the key architectural
element responsible for the high performance of the
FPS-5000 Series. Multiple Arithmetic Coprocessors may be
configured in a distributed processing system managed by
the Control Processor. Shared access to the System
Common Memory provides a common database and
communications path between processors. As shown in
Figure 2, the Arithmetic Coprocessor is a specialized
array processor whose internal architecture is optimized
for the FFT and complex vector computations [3].
Operating on a synchronous 6-MHz clock, the Arithmetic
Coprocessor includes one floating-point multiplier
and two floating-point adders with 32 bits of precision.
The internal memory is fast enough to allow two
memory operations by the Arithmetic Coprocessor as well
as DMA controller access on each cycle, with no
impact on processor performance.
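A rough upper bound on arithmetic throughput follows from these figures. Assuming (this is not stated in the text) that each of the three floating-point units can begin a new operation on every clock cycle, the cycle time and peak rate are:

```latex
t_{\mathrm{cycle}} = \frac{1}{6\,\mathrm{MHz}} \approx 166.7\,\mathrm{ns},
\qquad
R_{\mathrm{peak}} = \underbrace{(1 + 2)}_{\text{multiplier} + \text{adders}}
  \times 6 \times 10^{6}\,\mathrm{s}^{-1} = 18 \times 10^{6}\ \text{FLOP/s}
```

Sustained rates on real routines such as the FFT will be lower, since not every unit is busy on every cycle.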
Figure 2. Arithmetic Coprocessor architecture.
GPIOP I/O COPROCESSOR
ARCHITECTURE

The GPIOP (General-purpose Programmable I/O
Processor) is an interface processor designed to provide
a flexible high-speed path into the FPS-5000 Series
array processor from external devices (see Figure 3).
The GPIOP includes two processing elements: a 20-bit-wide
bit-slice processor used for address calculations
and device protocol, and a format processor for fix/float
and pack/unpack operations. The format processor
(FPROC) is supplied with over 55 format conversion
routines that provide on-the-fly conversion between
many popular data formats. The GPIOP is a powerful
device that is easily adapted for connection to A/D and
D/A converters, tape drives, disks, bulk memory,
display systems, and real-time control equipment.
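The kind of fix/float and pack/unpack work the FPROC performs on the fly can be illustrated in software. The format below (two signed 16-bit two's-complement samples per 32-bit word, high half first, scaled into the range [-1, 1)) is a hypothetical example, not one of the FPROC's supplied routines, and the function names are illustrative only.

```python
from typing import List, Tuple


def _to_signed16(v: int) -> int:
    """Interpret a 16-bit field as a two's-complement integer."""
    return v - 0x10000 if v & 0x8000 else v


def unpack_int16_pair(word: int) -> Tuple[int, int]:
    """Split a 32-bit word into two signed 16-bit samples (high half first)."""
    return _to_signed16((word >> 16) & 0xFFFF), _to_signed16(word & 0xFFFF)


def pack_int16_pair(first: int, second: int) -> int:
    """Inverse of unpack_int16_pair: pack two signed 16-bit samples into 32 bits."""
    return ((first & 0xFFFF) << 16) | (second & 0xFFFF)


def fix_to_float(sample: int, full_scale: int = 32768) -> float:
    """Fixed-to-floating conversion, scaling a 16-bit sample into [-1.0, 1.0)."""
    return sample / full_scale


def convert_words(words: List[int]) -> List[float]:
    """Unpack a buffer of packed words and convert every sample to floating point."""
    out: List[float] = []
    for w in words:
        a, b = unpack_int16_pair(w)
        out.extend((fix_to_float(a), fix_to_float(b)))
    return out
```

In hardware this conversion happens as data streams through the GPIOP, so the Arithmetic Coprocessor only ever sees floating-point samples.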


FINDING THE PEAK POWER 
FREQUENCY 


As an example of power spectrum calculations, consider
the task of continuously estimating the frequency with the
highest received power within each frame of 1024 real
input data points. Figure 4 shows the data-flow
organization and the processing resources of the
FPS-5000 Series used to implement this application.
Figure 3. GPIOP Architecture.
Figure 4. Data flow organization.


The signal is sampled by an A/D converter under direct
control of the GPIOP. In this example the GPIOP
manages double 1K input buffers in System Common
Memory. The Arithmetic Coprocessor then transfers
data from these buffers to its local memory, where it
performs a windowing operation (Hamming) and a
forward real FFT [4]. The complex spectrum is moved out
of the Arithmetic Coprocessor and back into System
Common Memory, where the Control Processor performs
power conversion and peak detection. The resulting
frequency pointer is passed to the host for analysis or
display, and the entire process is repeated for every 1K
block of input samples. Because of the highly parallel
architecture of the FPS-5000 Series array processor,
most of the data transfers and control of each processing
element are overlapped with computation.
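The per-frame processing described above (window, forward FFT, power conversion, peak detection) can be sketched in plain Python. This is an illustrative model, not the FPS math-library routines: it applies a textbook recursive radix-2 FFT to real data rather than a dedicated real-FFT routine, computes power only over the non-redundant bins 0 to N/2, and the helper names (`hamming`, `fft`, `peak_power_bin`, `bin_to_hz`) and the 8-kHz sample rate in the usage note are hypothetical.

```python
import cmath
import math
from typing import List


def hamming(n: int) -> List[float]:
    """Hamming window coefficients w[k] = 0.54 - 0.46 cos(2*pi*k/(n-1))."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * k / (n - 1)) for k in range(n)]


def fft(x: List[complex]) -> List[complex]:
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]  # twiddle factor times odd half
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def peak_power_bin(frame: List[float], window: List[float]) -> int:
    """Window a real frame, transform it, and return the bin with maximum power."""
    windowed = [s * w for s, w in zip(frame, window)]  # vector multiply (window)
    spectrum = fft([complex(v) for v in windowed])     # forward FFT
    half = spectrum[: len(frame) // 2 + 1]             # real input: bins 0..N/2 unique
    power = [c.real * c.real + c.imag * c.imag for c in half]  # power conversion
    return max(range(len(power)), key=power.__getitem__)       # peak detection


def bin_to_hz(k: int, n: int, fs: float) -> float:
    """Map bin index k of an n-point transform to a frequency in Hz."""
    return k * fs / n
```

For example, a unit sinusoid completing exactly 100 cycles in a 1024-point frame gives `peak_power_bin(frame, hamming(1024)) == 100`, and `bin_to_hz(100, 1024, 8000.0)` maps that to 781.25 Hz at the assumed sample rate. The returned bin index plays the role of the frequency pointer passed to the host.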
IMPLEMENTATION


The power spectrum and maximum search operations
are easily performed using standard FPS-5000 Series
math library routines. Because the execution time of
each routine can be estimated, timing predictions can be
made for this power spectrum analysis application. The
entire data collection phase is controlled by the
programmable GPIOP, which executes the interface
protocol with the A/D converter and manages buffers
and flags in the System Common Memory.
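The buffer-and-flag handshake between the GPIOP and the Arithmetic Coprocessor follows the classic ping-pong (double) buffer pattern. The sketch below is a hypothetical single-threaded model, not GPIOP firmware: the class name, the per-buffer `full` flags, and the overrun behavior (refusing to overwrite a buffer the consumer has not yet released) are assumptions chosen to illustrate the pattern.

```python
from typing import List, Optional


class DoubleBuffer:
    """Two frame buffers plus 'full' flags, modeling ping-pong buffers in
    System Common Memory (hypothetical; not the actual GPIOP protocol)."""

    def __init__(self, frame_len: int) -> None:
        self.buffers = [[0.0] * frame_len, [0.0] * frame_len]
        self.full = [False, False]  # set by the producer, cleared by the consumer
        self.fill_index = 0         # buffer the producer writes next
        self.drain_index = 0        # buffer the consumer reads next

    def produce(self, samples: List[float]) -> bool:
        """Producer (GPIOP side): fill the current buffer if it is free."""
        i = self.fill_index
        if self.full[i]:
            return False  # overrun: the consumer still owns this buffer
        self.buffers[i][:] = samples
        self.full[i] = True
        self.fill_index ^= 1  # switch to the other buffer
        return True

    def consume(self) -> Optional[List[float]]:
        """Consumer (Arithmetic Coprocessor side): take the oldest full buffer."""
        i = self.drain_index
        if not self.full[i]:
            return None  # no frame ready yet
        frame = list(self.buffers[i])
        self.full[i] = False  # release the buffer back to the producer
        self.drain_index ^= 1
        return frame
```

Because the producer always writes one buffer while the consumer drains the other, sampling can continue while the previous frame is processed, provided the consumer keeps up with the frame rate.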


The Arithmetic Coprocessor can fully overlap
DMA and channel list interpretation with arithmetic
processing as long as the calculations require more time
than control and DMAs. Similarly, the Control Processor
performs arithmetic processing as well as overall
process control. In this example the Control Processor's
arithmetic operations and control overhead together take
less time than the Arithmetic Coprocessor processing, and
are completely overlapped using double buffers.
Therefore the total frame time is limited by the VMUL and
RFFT operations in the Arithmetic Coprocessor. In
Figure 5, the FPS-5410 is shown in a configuration
that includes 256K of System Common Memory
and one Arithmetic Coprocessor.


An equivalent implementation in which an FPS-100
executes all of the processing is shown below for
comparison. For the FPS-100 the frame time equals
the arithmetic processing time, because all buffers are
held in Main Data memory and the GPIOP manages all
input buffers.
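The timing argument above reduces to simple arithmetic: with double buffering, each processor works on a different frame concurrently, so the busiest processor sets the frame time, whereas a single processor must execute every arithmetic stage in sequence. The sketch below encodes that model; the function names and the grouping of stages by processor are illustrative, and no actual FPS timing values are assumed.

```python
from typing import Dict, List


def overlapped_frame_time(stages_by_processor: Dict[str, List[float]]) -> float:
    """Double-buffered pipeline: each processor runs its own stages back to back,
    processors work on different frames concurrently, so the busiest one
    determines the frame time."""
    return max(sum(times) for times in stages_by_processor.values())


def serial_frame_time(stage_times: List[float]) -> float:
    """Single processor executing every arithmetic stage in turn."""
    return sum(stage_times)


def speedup(reference_time: float, new_time: float) -> float:
    """Ratio of frame times; a value above 1 means the new configuration is faster."""
    return reference_time / new_time


def max_sample_rate(frame_len: int, frame_time_s: float) -> float:
    """Highest input sample rate (samples/s) that keeps up in real time."""
    return frame_len / frame_time_s
```

With purely hypothetical stage times, `overlapped_frame_time({"AC": [1.0, 4.0], "CP": [3.0]})` is 5.0, while the same stages run serially take 8.0, a speedup of 1.6. `max_sample_rate` then gives the highest input rate a 1024-point frame can sustain for a given frame time.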
Figure 5. Performance Analysis


The Arithmetic Coprocessor in the FPS-5410 is designed
with independent data transfer and arithmetic processing
sections, allowing concurrent loading/unloading and
processing with no time penalty. The timing chart for the
maximum power spectrum computation is shown in Figure 6.
As shown, all data transfer operations are overlapped
with the spectral calculations. A vector multiply (VMUL)
is used to apply the window rather than an integrated
Hamming routine because it yields better performance.
The Arithmetic Coprocessor contains 16K of data
memory, which is more than enough to hold
the extra window coefficient vector.
Figure 6. System timing chart.


SUMMARY 


The power and flexibility of a distributed processing
system are clearly illustrated by the peak power
frequency calculation. As shown above, the FPS-5410 with
one Arithmetic Coprocessor achieves almost 3.9 times
the performance of the FPS-100, a currently available
array processor.


The availability of more specialized signal processing
routines and application development tools makes the
FPS-5000 Series array processor a powerful tool for
new system designs.
DEVIATIONS FROM STANDARD
RELEASE 


1.1 AP-FORTRAN 


APFTN38 is available as optional software. 


1.2 Chained APEX (CAPEX)


Chained APEX (CAPEX) processing is provided. When
extended memory (page select) registers are read or
written in chain mode, the channel program is
forced to execute with a call to APEXC. A single
channel program is limited to 400 CCWs; when this
limit is reached, the channel program is likewise forced to
execute with a call to APEXC.
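The two conditions that force a CAPEX channel program to execute can be illustrated with a small batching model. Everything here is hypothetical: `ChainedProgram`, the string CCWs, and the `execute` callback (standing in for the forced call to APEXC) are not the APEX interface, and including the page-select CCW in the program that is forced out (rather than in the next one) is an assumption.

```python
from typing import Callable, List

MAX_CCWS = 400  # per-channel-program limit stated in the release notes


class ChainedProgram:
    """Hypothetical model of CAPEX chain-mode batching. `execute` stands in
    for the forced APEXC call; it is not the real APEX interface."""

    def __init__(self, execute: Callable[[List[str]], None]) -> None:
        self.execute = execute
        self.ccws: List[str] = []

    def add(self, ccw: str, page_select: bool = False) -> None:
        """Queue one CCW; force execution on a page-select access or at the limit."""
        self.ccws.append(ccw)
        # Assumption: the page-select CCW is queued first, then the program is forced out.
        if page_select or len(self.ccws) >= MAX_CCWS:
            self.flush()

    def flush(self) -> None:
        """Execute any queued CCWs and start a new, empty channel program."""
        if self.ccws:
            self.execute(self.ccws)
            self.ccws = []
```

Adding 401 ordinary CCWs triggers one forced execution of 400, leaving one CCW queued; a subsequent page-select access then forces the remaining two out.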


1.3 Control-bit 5 Interrupts


Control-bit 5 interrupts are not supported. The 
related APEX routines call APSTOP. 


1.4 DMA Overlap 


An FPS-defined DMA is equivalent to an IBM 
channel program. In step mode a DMA operation 
can be started while the AP is running. Control 
will return to the user before completion of the 
DMA. Any subsequent access request will be sus- 
pended until the DMA completes. In chain mode a 
DMA can be performed while the AP is running. 


The AP cannot be started running while a DMA is 
in progress. 
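The DMA overlap rules can be summarized as a small state model: a DMA may begin while the AP is running, control returns before the DMA completes, further access requests wait for completion, and the AP cannot be started while a DMA is in progress. The class below is a hypothetical illustration of those rules, not an FPS or IBM interface; it does not distinguish step mode from chain mode, and returning `False` to represent a suspended request is a modeling choice.

```python
class APChannel:
    """Hypothetical model of the DMA overlap rules (not an FPS or IBM API)."""

    def __init__(self) -> None:
        self.ap_running = False
        self.dma_in_progress = False

    def start_dma(self) -> bool:
        """Start a DMA whether or not the AP is running; control returns at once.
        Returns False (request suspended) while an earlier DMA is still active."""
        if self.dma_in_progress:
            return False
        self.dma_in_progress = True
        return True

    def dma_complete(self) -> None:
        """Mark the outstanding DMA as finished."""
        self.dma_in_progress = False

    def can_access(self) -> bool:
        """Subsequent access requests are held until the DMA completes."""
        return not self.dma_in_progress

    def start_ap(self) -> None:
        """The AP cannot be started while a DMA is in progress."""
        if self.dma_in_progress:
            raise RuntimeError("cannot start the AP while a DMA is in progress")
        self.ap_running = True
```

The asymmetry is the point of the model: starting a DMA under a running AP is allowed, but starting the AP under an active DMA is not.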


1.5 Format Registers 


AP and host access to the format (high and low) 
registers and the IFSTAT register is not supported. 


1.6 HMA and WC Registers (AP) 


Access to the HMA and WC registers from within 
the AP is not supported. 


1.7 HMA and WC Registers (Host) 


Access to the HMA and WC registers from the host 
is simulated in software. These values are never 
actually written to or read from the AP. 


